Chip Refinement Character Recognition Text Clean - up I 2 Segmentation Texture Segmentation Texture Segmentation Texture Segmentation Texture Generation
Abstract
There are many applications in which the automatic detection and recognition of text embedded in images is useful. These applications include multimedia systems, digital libraries, and Geographical Information Systems. When machine-generated text is printed against clean backgrounds, it can be converted to a computer-readable form (ASCII) using current Optical Character Recognition (OCR) technology. However, text is often printed against shaded or textured backgrounds or is embedded in images. Examples include maps, advertisements, photographs, videos and stock certificates. Current OCR and other document segmentation and recognition technologies cannot handle these situations well. In this paper, a system that automatically detects and extracts text in images is proposed. This system consists of four phases. First, by treating text as a distinctive texture, a texture segmentation scheme is used to focus attention on regions where text may occur. Second, strokes are extracted from the segmented text regions. Using reasonable heuristics on text strings, such as height similarity, spacing and alignment, the extracted strokes are then processed to form tight rectangular bounding boxes around the corresponding text strings. To detect text over a wide range of font sizes, the above steps are first applied to a pyramid of images generated from the input image, and then the boxes formed at each resolution of the pyramid are fused at the original resolution. Third, an algorithm which cleans up the background and binarizes the detected text is applied to extract the text from the regions enclosed by the bounding boxes in the input image. Finally, text bounding boxes are refined (regenerated) by using the extracted items as strokes. These new boxes usually bound text strings better. The clean-up and binarization process is then carried out on the regions in the input image bounded by the boxes to extract cleaner text.
The extracted text can then be passed through a commercial OCR engine for recognition if the text is of an OCR-recognizable font. Experimental results show that the algorithms work well on images from a wide variety of sources, including newspapers, magazines, printed advertisements, photographs, digitized video frames, and checks. The system is also stable and robust; the system parameters work for all the experiments.
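The second phase described above (grouping strokes into bounding boxes by height similarity, spacing and alignment, then fusing boxes from a pyramid at the original resolution) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Box` type, the function names, the thresholds (`height_ratio`, `gap_factor`, `align_factor`) and the assumption that each pyramid level halves the image size are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (x0, y0) is top-left, (x1, y1) is bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def height(self) -> float:
        return self.y1 - self.y0


def compatible(a: Box, b: Box, height_ratio: float = 1.5,
               gap_factor: float = 1.0, align_factor: float = 0.5) -> bool:
    """True if two strokes plausibly belong to the same horizontal text string.

    All three thresholds are illustrative assumptions, not values from the paper.
    """
    tall, short = max(a.height, b.height), min(a.height, b.height)
    if short <= 0 or tall / short > height_ratio:      # height similarity
        return False
    gap = max(a.x0, b.x0) - min(a.x1, b.x1)            # horizontal spacing
    if gap > gap_factor * tall:
        return False
    centre_a = (a.y0 + a.y1) / 2
    centre_b = (b.y0 + b.y1) / 2
    return abs(centre_a - centre_b) <= align_factor * tall  # vertical alignment


def group_strokes(strokes: List[Box]) -> List[Box]:
    """Merge pairwise-compatible strokes (union-find) into tight bounding boxes."""
    parent = list(range(len(strokes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(strokes)):
        for j in range(i + 1, len(strokes)):
            if compatible(strokes[i], strokes[j]):
                parent[find(i)] = find(j)

    groups: Dict[int, List[Box]] = {}
    for i, stroke in enumerate(strokes):
        groups.setdefault(find(i), []).append(stroke)

    boxes = [
        Box(min(s.x0 for s in g), min(s.y0 for s in g),
            max(s.x1 for s in g), max(s.y1 for s in g))
        for g in groups.values()
    ]
    return sorted(boxes, key=lambda b: (b.y0, b.x0))


def to_original_resolution(box: Box, level: int) -> Box:
    """Map a box found at pyramid level `level` back to level 0.

    Assumes each level halves the image in both dimensions.
    """
    scale = 2 ** level
    return Box(box.x0 * scale, box.y0 * scale, box.x1 * scale, box.y1 * scale)
```

For example, three strokes of similar height lying close together on one line would be merged into a single box, while a much taller stroke elsewhere in the image would stay in its own box. Boxes found at coarser pyramid levels would be rescaled with `to_original_resolution` before being fused with those found at full resolution.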
Similar resources
Unsupervised Texture Image Segmentation Using MRFEM Framework
Texture image analysis is one of the most important working realms of image processing in medical sciences and industry. Up to present, different approaches have been proposed for segmentation of texture images. In this paper, we offered unsupervised texture image segmentation based on Markov Random Field (MRF) model. First, we used Gabor filter with different parameters’ (frequency, orientatio...
Color Image Segmentation using Fuzzy Local Texture Patterns
Texture is one of the fundamental image characteristics useful in computer vision tasks such as object recognition and scene analysis. Texture segmentation is one of the image analysis tasks. The prospect of texture segmentation depends on the choice of the texture description method and the segmentation procedure. In this paper, color-texture descriptors are proposed to represent the texture c...
Classification of Endometrial Images for Aiding the Diagnosis of Hyperplasia Using Logarithmic Gabor Wavelet
Introduction: The process of discriminating between benign and malignant hyperplasia began with subjective methods using light microscopy and is now being continued with computerized morphometrical analysis requiring some features. One of the main features, called Volume Percentage of Stroma (VPS), is obtained by calculating the percentage of stroma texture. Currently, this feature is calculated ...
TextFinder: An Automatic System to Detect and Recognize Text In Images
A robust system is proposed to automatically detect and extract text in images from different sources, including video, newspapers, advertisements, stock certificates, photographs, and checks. Text is first detected using multiscale texture segmentation and spatial cohesion constraints, then cleaned up and extracted using a histogram-based binarization algorithm. An automatic performance evalu...
Publication date: 1997